Automated Clustering and Assembly of Large EST Collections
Authors
Abstract
The availability of large EST (Expressed Sequence Tag) databases has led to a revolution in the way new genes are cloned. Difficulties arise, however, due to high error rates and redundancy of raw EST data. For these reasons, one of the first tasks performed by a scientist investigating any EST of interest is to gather contiguous ESTs and assemble them into a larger virtual cDNA. The REX (Recursive EST eXtender) algorithm described in this paper completely automates this process by finding ESTs that can be clustered on the basis of overlapping bases, and then assembling the contigs into a consensus sequence. By combining the clustering and assembly steps, REX can quickly generate assemblies from EST databases that are frequently updated without having to preprocess the data. A consensus assembly method is used to correct miscalled bases and remove indel errors. A unique feature of this method is that it addresses the issues of splice variants and unspliced cDNA data. Since REX is a fast greedy algorithm, it can address the problem of generating a database of assembled sequences from very large collections of EST data. A procedure is described for creating and maintaining an Assembled Consensus EST database (ACE) that is useful for characterizing the large body of data that exists in EST databases.
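The abstract describes REX only at a high level: cluster ESTs by overlapping bases, assemble each cluster into a contig, and take a consensus to correct miscalled bases. The following is a minimal, hypothetical Python sketch of that general idea, not REX's actual algorithm. It assumes substitution errors only (no indels), scores an overlap by its length and mismatch count, and uses invented names (`best_overlap`, `consensus`, `greedy_assemble`, `min_len`, `max_mismatch`). It also ignores splice variants, unspliced cDNA, reverse complements, and incremental database updates, all of which the paper addresses.

```python
from collections import Counter


def best_overlap(a: str, b: str, min_len: int, max_mismatch: int):
    """Align b[0] with a[k] and pick the k giving the longest aligned stretch
    a[k:k+n] vs b[:n] (n >= min_len) with at most `max_mismatch` substitutions.
    Covers suffix-prefix overlaps and `b` contained in `a`.
    Returns (n, mismatches, k), or (0, 0, -1) if nothing qualifies."""
    best = (0, 0, -1)
    for k in range(0, len(a) - min_len + 1):
        n = min(len(a) - k, len(b))
        if n < min_len:
            continue
        mism = sum(x != y for x, y in zip(a[k:k + n], b[:n]))
        if mism <= max_mismatch and (n, -mism) > (best[0], -best[1]):
            best = (n, mism, k)
    return best


def consensus(placed):
    """Majority vote per column over non-empty reads placed at integer offsets.
    Ties go to the base encountered first."""
    columns = {}
    for offset, read in placed:
        for i, base in enumerate(read):
            columns.setdefault(offset + i, Counter())[base] += 1
    return "".join(columns[p].most_common(1)[0][0] for p in range(max(columns) + 1))


def greedy_assemble(ests, min_len=20, max_mismatch=1):
    """Hypothetical greedy clustering + assembly: repeatedly merge the two
    contigs whose consensus sequences overlap best, until no pair overlaps by
    at least `min_len` bases. Returns a list of (consensus, member_reads)."""
    contigs = [[(0, s)] for s in ests]  # each contig: list of (offset, read)
    while True:
        cons = [consensus(c) for c in contigs]
        best = None  # (overlap, -mismatches, i, j, offset of j inside i)
        for i in range(len(contigs)):
            for j in range(len(contigs)):
                if i == j:
                    continue
                n, mism, k = best_overlap(cons[i], cons[j], min_len, max_mismatch)
                if n and (best is None or (n, -mism) > best[:2]):
                    best = (n, -mism, i, j, k)
        if best is None:
            # No qualifying overlap left: each contig is one cluster.
            return [(cons[idx], [r for _, r in c]) for idx, c in enumerate(contigs)]
        _, _, i, j, k = best
        # Shift contig j's reads by k so they line up with contig i, then merge.
        merged = contigs[i] + [(off + k, r) for off, r in contigs[j]]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]


if __name__ == "__main__":
    reads = ["ACGTTGCAAT", "TGCTATGCCG", "CAATGCCGTA", "GGGGGGGG"]
    for seq, members in greedy_assemble(reads, min_len=5, max_mismatch=1):
        print(seq, members)
```

In the demo, the first three reads tile the sequence `ACGTTGCAATGCCGTA`, and the second read carries a miscalled base (`T` instead of `A` at its fourth position). Once all three reads are merged, two of the three reads covering that column agree on `A`, so the consensus corrects the error, while the unrelated `GGGGGGGG` read stays in its own cluster. The all-pairs search at each merge step is quadratic in the number of contigs and is only meant to show the idea; it does not reflect how REX scales to very large EST collections.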
Similar resources
Effective Browsing of Image Search Results through Visual and Diverse Summarization via Clustering
With the unprecedented growth in the production of digital images and the use of multimedia references, the need for image and subject search has increased. Systematic processing of this information is a basic prerequisite for its effective analysis, organization and management. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...
dCAS: a desktop application for cDNA sequence annotation
MOTIVATION Understanding gene regulation and expression is the key to the advancement of biology. EST sequence assembly and analysis provide unique benefits in this regard. We have developed a standalone application, dCAS (Desktop cDNA Annotation System), which performs automated EST cleaning, clustering, assembly and annotation on a desktop computer. Compared with other available tools, dCAS p...
Massively parallel expressed sequence tag clustering
Expressed Sequence Tag (EST) sequencing is a highly efficient technique that samples expressed genes required for most cellular functions. While this is a well-studied problem and many software tools have been developed, large-scale EST clustering has previously been pursued through incremental approaches, a pipeline of programs and manual efforts to achieve a modest degree of parallelism. Here...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Process Capability Studies in an Automated Flexible Assembly Process: A Case Study in an Automotive Industry
Statistical Process Control (SPC) methods can significantly increase organizational efficiency if appropriately used. The primary goal of process capability studies is to obtain critical information about processes to render them even more effective. This paper proposes a comprehensive framework for proper implementation of SPC studies, including the design of the sampling procedure and interva...
Journal title:
- Proceedings. International Conference on Intelligent Systems for Molecular Biology
Volume 6
Publication date: 1998